Combining feature norms and text data with topic models.
نویسنده
چکیده
Many psychological theories of semantic cognition assume that concepts are represented by features. The empirical procedures used to elicit features from humans rely on explicit human judgments which limit the scope of such representations. An alternative computational framework for semantic cognition that does not rely on explicit human judgment is based on the statistical analysis of large text collections. In the topic modeling approach, documents are represented as a mixture of learned topics where each topic is represented as a probability distribution over words. We propose feature-topic models, where each document is represented by a mixture of learned topics as well as predefined topics that are derived from feature norms. Results indicate that this model leads to systematic improvements in generalization tasks. We show that the learned topics in the model play in an important role in the generalization performance by including words that are not part of current feature norms.
منابع مشابه
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملAn approach to rank efficient DMUs in DEA based on combining Manhattan and infinity norms
In many applications, discrimination among decision making units (DMUs) is a problematic technical task procedure to decision makers in data envelopment analysis (DEA). The DEA models unable to discriminate between extremely efficient DMUs. Hence, there is a growing interest in improving discrimination power in DEA yet. The aim of this paper is ranking extreme efficient DMUs in DEA based on exp...
متن کاملA General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram
Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission and based on the statistics given for October 2019 to present, 300 million people worldwide used telegram per month. Telegram users are more concentrated in...
متن کاملEfficient Method Based on Combination of Deep Learning Models for Sentiment Analysis of Text
People's opinions about a specific concept are considered as one of the most important textual data that are available on the web. However, finding and monitoring web pages containing these comments and extracting valuable information from them is very difficult. In this regard, developing automatic sentiment analysis systems that can extract opinions and express their intellectual process has ...
متن کاملGeographic Feature Type Topic Model (GFTTM): grounding topics in the landscape
7 Probabilistic topic models are a class of unsupervised machine learning models used for understanding the latent topics in a corpus of documents. A new method for combining geographic feature data with text from geo-referenced documents to create topic models that are grounded in the physical environment is proposed. The Geographic Feature Type Topic Model (GFTTM) models each document in a co...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Acta psychologica
دوره 133 3 شماره
صفحات -
تاریخ انتشار 2010